Mitigating Economic Losses Through Flood Predictions: Analyzing the Impact of Affected Populations on Economic Damages

Author

SID: 540305506 | University of Sydney | DATA1001 | October 2024

1. Client Bio and Recommendation

1.1 Client: IFRC - Flood Resilience Program

Bio: IFRC - Flood Resilience Program focuses on improving flood resilience in vulnerable communities across Asia, Africa, and other regions. Their initiatives include disaster preparedness, community education, and policy recommendations to mitigate flood impacts.

1.2 Recommendation

The main goal of the program is to reduce the number of people impacted by floods and minimize economic losses. This report demonstrates a moderate correlation between the number of affected people and economic damages, indicating that reducing the number of people affected by floods not only protects human lives but also helps reduce economic losses, especially in Asia. While many other factors can influence economic damage from floods, reducing the number of affected people remains a key factor. Through appropriate measures such as early warning systems, floodplain zoning, and the protection of vulnerable communities, the economic damage from future natural disasters can be minimized if the impact of floods on people is mitigated effectively.

2. Evidence

Code
# Load necessary libraries
library(tidyverse)

# Set options to avoid scientific notation in plots
options(scipen = 999)

# 1. Load Data
# The dataset 'natural-disasters.csv' contains information on natural disasters, including floods.
# Select the relevant columns for analysis.
raw_data <- read.csv("natural-disasters.csv")

# 2. Data Wrangling
# Extract the necessary columns related to floods and economic damage
data <- raw_data[, c("Entity", "Year", "Number.of.people.affected.by.floods", "Total.economic.damages.from.floods")]

# Clean the data by omitting rows with missing values
# This step ensures that the dataset is ready for analysis and free from NA values
clean_data <- na.omit(data)

To demonstrate evidence-based decision-making, we analyse data related to the correlation between the number of people affected by floods and total economic damages.

2.1 Relationship between Number of People Affected and Total Economic Damages

Code
# Compute the correlation between the two variables
correlation = cor(clean_data$Number.of.people.affected.by.floods, clean_data$Total.economic.damages.from.floods)

# Rounding the value to 2dp
round(correlation, 2)
[1] 0.74
Code
# Loading necessary libraries
library(ggplot2)
library(plotly)
# Produce a regression line
p = ggplot(clean_data, aes(x = Number.of.people.affected.by.floods , y =Total.economic.damages.from.floods)) +
  geom_point(color ="red") + # Setting the points to red for scatscatterplot
  geom_smooth(method = "lm", color = "blue", se =FALSE) +
  labs(title = "Figure 2a. Relationship Between Number 
               of People Affected and Economic Damages",
       x = "Number of people affected",
       y = "Total economic damages (USD)") +
  scale_x_continuous(labels = scales::comma) + # Format x-axis with commas
  scale_y_continuous(labels = scales::comma) + # Format y-axis with commas
  theme_minimal()
# Make it interactive
l = ggplotly(p)
l

Correlation: 0.74
Linear equation: y = 0.1565255x + 85843.659821

The correlation coefficient of 0.74 indicates a moderate linear relationship between the two variables. The scatterplot with the regression line shows as the number of people affected by floods increased, total economic damages from floods also increased. For instance, for each additional person affected by floods, economic damages are predicted to rise by approximately $0.1565 USD.

Code
# Fit the linear model
model = lm(Number.of.people.affected.by.floods ~ Total.economic.damages.from.floods, data = clean_data)
# Loading necessary libraries
library(ggplot2)
library(plotly)
# Create residual plot
p = ggplot(model, aes(x = .fitted, y = .resid)) + # Plotting fitted values and residuals
  geom_point(color ="red") + # Setting the points to red for scatterplot
  geom_hline(yintercept = 0, linetype = "dashed", colour = "blue") + # Horizontal line at 0
  labs(title = "Figure 2b. Residual plot of Number 
              of People Affected and Economic Damages",
       x = "Fitted values",
       y = "Residual values") +
  scale_x_continuous(labels=scales::comma) + #Format x-axis with commas
  scale_y_continuous(labels = scales::comma) #Format y-axis with commas
#Make the plot interactive
l = ggplotly(p)
l

However, the uneven distribution around the regression line and the presence of outliers suggest that this model does not sufficiently explain all the factors affecting total economic damages.

2.2 Hypothesis Testing

A hypothesis-testing framework is used to examine the relationship between the Number of people affected and Total economic damages in order to strengthen the evidence of a linear correlation between the two variables.

Null Hypothesis (H0): There was no significant linear relationship between the number of people affected by floods and economic damages. The slope coefficient is equal to zero (𝛽 = 0).
Alternative Hypothesis (H1): There was a significant positive linear relationship between the number of people affected by floods and economic damages. The slope of coefficient is greater than zero (𝛽 > 0).

p-value <0.0000000000000002 < α-value(=0.05)

Therefore, H0 is rejected, indicating the positive linear relationship between the number of people affected by floods and total economic damages was highly significant.

2.3 Continent-Specific Analysis

Code
# Loading necessary libraries
library(DT)  # For rendering interactive tables

# Grouping and summarizing data by continent
# This step filters data for the specified continents and sums the number of people affected by floods.
continent_summary <- clean_data %>%
  filter(Entity %in% c("Asia", "Africa", "Europe", "North America", "South America", "Oceania")) %>%  # Filter for relevant continents
  group_by(Entity) %>%  # Group data by continent
  summarise(Total_Affected = sum(Number.of.people.affected.by.floods, na.rm = TRUE))  # Summing the number of affected people, handling missing values with na.rm = TRUE

# Rendering the table
# datatable() is used to display the summarized data in an interactive, paginated table.
datatable(
  continent_summary,  # Data to display
  options = list(pageLength = 5, autoWidth = TRUE),  # Display options: set page length to 5 and enable auto width adjustment
  caption = 'Table: Total number of people affected by floods across continents'  # Caption for the table
)

According to the table, Asia was the most heavily impacted continent from 1900-2010, with the highest number of affected people. From there, it shows that large, densely populated areas, especially in coastal zones and low-lying river plains, are highly vulnerable to flood risks[1]. The concentration of populations in these flood-prone regions increases their susceptibility to significant flooding impacts.

2.4 Case Study: Asia (1960–2000)

Code
# Loading necessary libraries for data visualization and manipulation
library(ggplot2)  # For creating visualizations
library(dplyr)  # For data manipulation using the tidyverse
library(plotly) # For interactive plot
# Filtering data for Asia between the years 1960, 1970, 1980, 1990, and 2000
# This selects relevant data for specific years and for Asia only.
asia_data <- clean_data %>%
  filter(Entity == "Asia", Year %in% c(1960, 1970, 1980, 1990, 2000))  # Filter the data by continent and year

# Creating a plot to visualize the relationship between total economic damages and the number of people affected by floods
p = ggplot(asia_data, aes(x = Year)) +  # Plotting Year on the x-axis
  geom_line(aes(y = `Total.economic.damages.from.floods`, color = "Total Economic Damages (USD)"), 
            size = 1.2, linetype = "solid") +  # Plot solid line for economic damages
  geom_line(aes(y = `Number.of.people.affected.by.floods`, color = "Number of People Affected"), 
            size = 1.2, linetype = "dashed") +  # Plot dashed line for number of people affected
  geom_point(aes(y = `Total.economic.damages.from.floods`, color = "Total Economic Damages (USD)"), size = 3) +  # Adding points for clarity
  geom_point(aes(y = `Number.of.people.affected.by.floods`, color = "Number of People Affected"), size = 3, shape = 4) +  # Add points for affected people
  labs(title = "Figure 2c. Number of Affected People 
              and Total Economic Damages in Asia (1960-2000)",
       x = "Year", # Label for the x-axis
       y = "", # Label for the y-axis
       color = "") +
  theme_minimal() + # Using a minimal theme for a clean aesthetic
  scale_y_continuous(labels = scales::comma) + # Formatting y-axis with commas for large numbers
  scale_color_manual(values = c("Total Economic Damages (USD)" = "blue", "Number of People Affected" = "red"))

#Make the plot interactive
l = ggplotly(p)
l

Asia should be prioritized for further analysis because of its highest number of affected individuals. The graph shows a clear pattern: as the number of people affected by floods decreased, total economic damages also tended to reduce, especially visible from the peak in 1990 to 2000. This highlights that strategies like flood predictions and timely warnings could significantly lower economic losses by reducing the number of people affected.[2]

2.5 Data Limitations

  • The raw data came from the past, covering the years 1960-2010, so the analysis may not accurately reflect the current situation.
  • Missing values: Incomplete surveys from some countries pose challenges for accurate estimations.

2.6 Articles

  1. Jun Rentschler & Melda Salhab. (2020). 1.47 billion people face flood risk worldwide, over a third of it could be devastating. World Bank Blogs. Retrieved from https://blogs.worldbank.org/en/climatechange/147-billion-people-face-flood-risk-worldwide-over-third-it-could-be-devastating
  2. Department of Planning and Environment. (2022). Flood risk management measures. NSW Government. Retrieved from https://www.environment.nsw.gov.au/-/media/OEH/Corporate-Site/Documents/Water/Floodplains/flood-risk-management-measures-220056.pdf

3. Appendix

3.1 Client Choice

The IFRC Flood Resilience Program aims to improve resilience in vulnerable communities, aligning with the report’s insights on how floods impact populations and economic damages to support mitigation strategies.

3.2 Statistical Analysis

3.2.1 Linear Modelling

Code
# Rounding the value to 2dp
round(correlation, 2)
[1] 0.74

Despite non-random distribution of residual plot (Figure 2.b), the linear model remains useful, showing a correlation of 0.74 between the two variables.

3.2.2 Hypothesis Testing

Hypothesis:

  • H0: No significant linear relationship exists.
  • H1: A significant linear relationship exists.

Assumptions:

  1. Independence: Data from different countries/years suggests independence.

  2. Normality: : Not satisfied, as Q-Q plot shows deviation, especially for quantiles ≥ 2 or ≤ 2.

Code
# Load necessary libraries
library(ggplot2)

# Fit the linear model
model <- lm(Total.economic.damages.from.floods ~ Number.of.people.affected.by.floods, data = data)

# Extract residuals from the model
residuals <- residuals(model)

# Create a data frame with residuals for plotting
residuals_df <- data.frame(residuals = residuals)

# Generate a pretty Q-Q plot using ggplot2
ggplot(residuals_df, aes(sample = residuals)) +
  stat_qq(color = "red", size = 2) +  # Points are dark blue and slightly larger
  stat_qq_line(color = "blue", linetype = "dashed", size = 1) +  # Reference line is dashed and red
  labs(title = "Q-Q Plot of Residuals", x = "Theoretical Quantiles", y = "Sample Quantiles") +
  theme_minimal(base_size = 15) +  # Use a minimal theme with larger base font size
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),  # Center and bold the title
    axis.title = element_text(face = "bold"),  # Bold the axis titles
    panel.grid = element_line(size = 0.5, color = "gray80")  # Light gray grid lines
  )

  1. Homoscedasicity: Residual plot is not homoscedastic .
Code
# Create residual plot
ggplot(model, aes(x = .fitted, y = .resid)) + # Plotting fitted values and residuals
  geom_point(color ="red") + # Setting the points to red for scatterplot
  geom_hline(yintercept = 0, linetype = "dashed", colour = "blue") + # Horizontal line at 0
  labs(title = "   Residual plot for Number of People Affected and 
   Economic Damages",
       x = "Fitted values",
       y = "Residual values") +
  scale_x_continuous(labels=scales::comma) + #Format x-axis with commas
  scale_y_continuous(labels = scales::comma) #Format y-axis with commas

  1. Linearity: The scatterplot looks linear.
Code
# Produce a regression line
ggplot(clean_data, aes(x = Number.of.people.affected.by.floods , y =Total.economic.damages.from.floods)) +
  geom_point()+   # Setting the points to red for scatterplot
  geom_smooth(method = "lm", se =FALSE) +
  labs(title = " Scatterplot with regression line for Number of People Affected 
 and Economic Damages",
       x = "Number of people affected",
       y = "Total economic damages (USD)") +
  scale_x_continuous(labels = scales::comma) + # Format x-axis with commas
  scale_y_continuous(labels = scales::comma) + # Format y-axis with commas
  theme_minimal()

Code
# Obtain a summary of the model
summary(model)

Call:
lm(formula = Total.economic.damages.from.floods ~ Number.of.people.affected.by.floods, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-7054942   -86063   -85844   -85122 28348291 

Coefficients:
                                        Estimate   Std. Error t value
(Intercept)                         85843.659821 40265.168385   2.132
Number.of.people.affected.by.floods     0.156255     0.004199  37.211
                                               Pr(>|t|)    
(Intercept)                                      0.0332 *  
Number.of.people.affected.by.floods <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1337000 on 1122 degrees of freedom
  (480 observations deleted due to missingness)
Multiple R-squared:  0.5524,    Adjusted R-squared:  0.552 
F-statistic:  1385 on 1 and 1122 DF,  p-value: < 0.00000000000000022

T = 37.211 p-value < 0.0000000000000002 < 0.05 → Rejects H0

Therefore, there was a significant positve linear trend between the two variables. However, 2/4 assumptions are violated, so other factors may affect the results and make it invalid.

3.2.3 Line chart

A line chart showing the variation of the two variables in Asia helps the client easily observe their relationship accross many years, highlighting the significant decline in both variables after peaking in 1990.

3.3 Limitations

  • Missing values
  • The data was outdated.
  • The linear model didn’t account for factors like reconstruction costs and business disruptions affecting economic damages.

4. Ethics Statement

https://isi-web.org/declaration-professional-ethics

4.1 Shared Values: Truthfulness and Integrity

The statistical results presented are based on rigorous analysis of the available data, free from external influence, and the methodologies used are clearly explained to ensure reproducibility. The report have made a conscientious effort to present data honestly, highlighting strengths and acknowledging limitations of the analysis, such as outdated data or external factors on the linear relationship.

4.2 Ethics Principles: Pursuing Objectivity

This report adheres to the principle of Pursuing Objectivity by selecting methods that ensure accurate and timely results, particularly focusing on minimizing economic losses due to floods. The data used has been carefully curated, with missing values removed to enhance the reliability of the analysis. All findings, including correlation coefficients and regression models, are presented

5. Acknowledgements

Resources:
ED post links

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions

ChatGPT sessions